4 research outputs found

    Developing a comprehensive framework for multimodal feature extraction

    Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to feature extraction services. In a world where nearly every new service has its own API, documentation, and/or client library, data scientists who need to combine diverse features obtained from multiple sources are often forced to write and maintain ever more elaborate feature extraction pipelines. To address this challenge, we introduce a new open-source framework for comprehensive multimodal feature extraction. Pliers is an open-source Python package that supports standardized annotation of diverse data types (video, images, audio, and text), and is expressly designed with both ease-of-use and extensibility in mind. Users can apply a wide range of pre-existing feature extraction tools to their data in just a few lines of Python code, and can also easily add their own custom extractors by writing modular classes. A graph-based API enables rapid development of complex feature extraction pipelines that output results in a single, standardized format. We describe the package's architecture, detail its major advantages over previous feature extraction toolboxes, and use a sample application to a large functional MRI dataset to illustrate how pliers can significantly reduce the time and effort required to construct sophisticated feature extraction workflows while increasing code clarity and maintainability.
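
The idea of modular extractor classes that all report results in one standardized format can be illustrated with a minimal, self-contained sketch. This is not the pliers API: the class names, the `transform` method, and the row layout below are hypothetical and chosen only for illustration, and the extractors are applied as a flat list rather than as a graph.

```python
class Extractor:
    """Hypothetical base class; subclasses implement _extract and return a dict of features."""

    def _extract(self, text):
        raise NotImplementedError

    def transform(self, text):
        # Every extractor reports its results in the same long format:
        # one row per feature, tagged with the extractor that produced it.
        return [
            {"extractor": type(self).__name__, "feature": name, "value": value}
            for name, value in self._extract(text).items()
        ]


class WordCountExtractor(Extractor):
    """Custom extractor: counts whitespace-separated words."""

    def _extract(self, text):
        return {"n_words": len(text.split())}


class MeanWordLengthExtractor(Extractor):
    """Custom extractor: average number of characters per word."""

    def _extract(self, text):
        words = text.split()
        mean = sum(map(len, words)) / len(words) if words else 0.0
        return {"mean_word_length": mean}


def run_all(extractors, text):
    """Apply each extractor to the same input and concatenate the rows."""
    rows = []
    for extractor in extractors:
        rows.extend(extractor.transform(text))
    return rows


rows = run_all(
    [WordCountExtractor(), MeanWordLengthExtractor()],
    "pliers extracts features",
)
```

Because every extractor returns rows with the same keys, adding a new one only requires writing another small subclass; the code that collects the results does not change.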

    Neural information retrieval: at the end of the early years

    A recent "third wave" of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.

    PyBIDS: Python tools for BIDS datasets

    Brain imaging researchers regularly work with large, heterogeneous, high-dimensional datasets. Historically, researchers have dealt with this complexity idiosyncratically, with every lab or individual implementing their own preprocessing and analysis procedures. The resulting lack of field-wide standards has severely limited reproducibility and data sharing and reuse. To address this problem, we and others recently introduced the Brain Imaging Data Standard (BIDS; Gorgolewski et al., 2016), a specification meant to standardize the process of representing brain imaging data. BIDS is deliberately designed with adoption in mind; it adheres to a user-focused philosophy that prioritizes common use cases and discourages complexity. By successfully encouraging a large and ever-growing subset of the community to adopt a common standard for naming and organizing files, BIDS has made it much easier for researchers to share, reuse, and process their data (Gorgolewski et al., 2017). The ability to efficiently develop high-quality spec-compliant applications itself depends to a large extent on the availability of good tooling. Because many operations recur widely across diverse contexts (for example, almost every tool designed to work with BIDS datasets involves regular file-filtering operations), there is a strong incentive to develop utility libraries that provide common functionality via a standardized, simple API. PyBIDS is a Python package that makes it easier to work with BIDS datasets. In principle, its scope includes virtually any functionality that is likely to be of general use when working with BIDS datasets (i.e., that is not specific to one narrow context). At present, its core and most widely used module supports simple and flexible querying and manipulation of BIDS datasets.
PyBIDS makes it easy for researchers and developers working in Python to search for BIDS files by keywords and/or metadata; to consolidate and retrieve file-associated metadata spread out across multiple levels of a BIDS hierarchy; to construct BIDS-valid path names for new files; and to validate projects against the BIDS specification, among other applications. In addition to this core functionality, PyBIDS also contains an ever-growing set of modules that support additional capabilities meant to keep up with the evolution and expansion of the BIDS specification itself. Currently, PyBIDS includes tools for (1) reading and manipulating data contained in various BIDS-defined files (e.g., physiological recordings, event files, or participant-level variables); (2) constructing design matrices and contrasts that support the new BIDS-StatsModel specification (for machine-readable representation of fMRI statistical models); and (3) automated generation of partial Methods sections for inclusion in publications. PyBIDS can be easily installed on all platforms via pip (pip install pybids), though currently it is not officially supported on Windows. The package has few dependencies outside of standard Python numerical and image analysis libraries (i.e., numpy, scipy, pandas, and NiBabel). The core API is deliberately kept minimalistic: nearly all interactions with PyBIDS functionality occur through a core BIDSLayout object initialized by passing in a path to a BIDS dataset. For most applications, no custom configuration should be required. Although technically still in alpha release, PyBIDS is already being used both as a dependency in dozens of other open-source brain imaging packages – e.g., fMRIPrep (Esteban et al., 2019), MRIQC (Esteban et al., 2017), datalad-neuroimaging (https://github.com/datalad/datalad-neuroimaging), and fitlins (https://github.com/poldracklab/fitlins) – and directly in many researchers' custom Python workflows.
Development is extremely active, with bug fixes and new features continually being added (https://github.com/bids-standard/pybids), and major releases occurring approximately every 6 months. As of this writing, 29 people have contributed code to PyBIDS, and many more have provided feedback and testing. The API is relatively stable, and documentation and testing standards follow established norms for open-source scientific software. We encourage members of the brain imaging community currently working in Python to try using PyBIDS, and welcome new contributions.
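
The kind of keyword-based file filtering described in this abstract can be illustrated without installing anything, using only the BIDS naming convention of underscore-separated `key-value` entities followed by a suffix and an extension. The sketch below is not PyBIDS code and does not use `BIDSLayout`; the helper names `parse_entities` and `find_files` and the example file list are hypothetical.

```python
from pathlib import PurePosixPath


def parse_entities(filename):
    """Split a BIDS-style file name into key-value entities, suffix, and extension."""
    name = PurePosixPath(filename).name
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = name.partition(".")
    # The last underscore-separated chunk is the suffix (e.g. "bold", "T1w").
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs if "-" in pair)
    entities["suffix"] = suffix
    entities["extension"] = "." + extension if extension else ""
    return entities


def find_files(filenames, **filters):
    """Return the file names whose entities match every keyword filter."""
    return [
        f
        for f in filenames
        if all(parse_entities(f).get(key) == value for key, value in filters.items())
    ]


# Hypothetical dataset listing, used only to exercise the helpers.
files = [
    "sub-01/func/sub-01_task-rest_bold.nii.gz",
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-02/func/sub-02_task-rest_bold.nii.gz",
    "sub-02/func/sub-02_task-rest_bold.json",
]

matches = find_files(files, sub="01", suffix="bold")
```

A real library has to handle much more than this (metadata spread across levels of the hierarchy, validation against the specification, path construction), which is the motivation the abstract gives for a shared utility package rather than per-project helpers like these.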